Techniques for Improving the Cache Performance in Parallel Applications

Authors

  • Inbum Jung
  • Joonwon Lee
Abstract

The performance of parallel programs suffers from memory access latencies induced by cache misses. To investigate the causes of these misses, this paper executes data-parallel applications on shared-memory multiprocessors. The experiments show that conflict misses account for most of the cache misses, owing to cross-interference among grains, each of which consists of part of the data arrays. To address this problem, a grain size tailored to the underlying cache architecture is devised. Besides interference among grains, cache performance is sensitive to the way data are structured. To build data structures that exhibit good cache behavior, a stride merging-arrays method is presented. This method reduces cache conflict misses and also reduces useless prefetches in cache lines that hold multiple words. Simulation results show that these techniques may enhance the performance of parallel applications through improved cache performance.
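
The abstract does not spell out the stride merging-arrays method, so the following is only a minimal sketch of the general idea behind array merging, not the authors' algorithm. It simulates a direct-mapped cache and compares two layouts for a loop that touches a[i] and b[i] in turn: two separate arrays whose base addresses happen to map to the same cache lines, and one merged array with the elements interleaved. The cache size, line size, element size, base addresses, and all function names are assumptions chosen for illustration.

```python
def count_misses(addresses, cache_bytes=1024, line_bytes=16):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block number
        index = block % num_lines      # cache line this block maps to
        if tags[index] != block:       # line holds another block: miss and replace
            tags[index] = block
            misses += 1
    return misses


WORD = 4   # bytes per array element (assumed)
N = 256    # elements per array (assumed)


def separate_layout(n=N, base_a=0, base_b=4096):
    """a and b are separate arrays; the bases differ by a multiple of the cache size."""
    for i in range(n):
        yield base_a + i * WORD        # address of a[i]
        yield base_b + i * WORD        # address of b[i], same cache line index as a[i]


def merged_layout(n=N, base=0):
    """a and b are interleaved in one array: a0 b0 a1 b1 ..."""
    for i in range(n):
        yield base + (2 * i) * WORD      # address of a[i]
        yield base + (2 * i + 1) * WORD  # address of b[i], adjacent to a[i]


separate_misses = count_misses(separate_layout())
merged_misses = count_misses(merged_layout())
print("separate arrays:", separate_misses, "misses")
print("merged array:   ", merged_misses, "misses")
```

Under these assumed parameters, every access in the separate layout evicts the line the other array just loaded, while the merged layout misses only once per cache line. Real caches are usually set-associative and real access patterns are less regular, so the gap seen here is an upper bound on what such a layout change could achieve.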

Publication date: 1999